An Improved Approach for Caption Based Image Web Crawler
Abstract
The World Wide Web [1] is a global, read-write information space. Text documents, images, multimedia, and many other items of information, referred to as resources, are identified by short, unique, global identifiers called Uniform Resource Identifiers (URIs), so that each can be found, accessed, and cross-referenced in the simplest possible way. The Web is a vast reservoir that provides unrestricted access to a large, inexhaustible pool of information, presented as hypertext documents formatted in the HyperText Markup Language (HTML). These documents contain hyperlinks to other documents.
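The abstract describes resources identified by URIs and HTML documents connected by hyperlinks, which is the structure a crawler traverses. The following is a minimal sketch and not the paper's method. It uses Python's standard `html.parser` to collect `<a href>` targets (resolved to absolute URIs with `urljoin`) and `<img>` sources. It assumes that an image's `alt` attribute can stand in for its caption. The class `LinkAndImageExtractor` and the function `extract` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndImageExtractor(HTMLParser):
    """Hypothetical helper: collects hyperlinks and (image URL, alt text) pairs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []   # absolute URLs of <a href="..."> targets
        self.images = []  # (absolute image URL, alt text) pairs

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; value may be None.
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative hrefs against the page's own URI.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            # Assumption: the alt attribute stands in for the image caption.
            alt = (attrs.get("alt") or "").strip()
            self.images.append((urljoin(self.base_url, attrs["src"]), alt))


def extract(html, base_url):
    """Return (links, images) found in one HTML page fetched from base_url."""
    parser = LinkAndImageExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links, parser.images
```

For example, calling `extract('<a href="/about.html">About</a><img src="cat.jpg" alt="A cat on a sofa">', "http://example.com/index.html")` returns the link `http://example.com/about.html` and the pair `("http://example.com/cat.jpg", "A cat on a sofa")`. A caption-based crawler might also consider nearby text such as `<figcaption>`, which this sketch ignores.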
Similar resources
Caption Crawler: Enabling Reusable Alternative Text Descriptions using Reverse Image Search
Accessing images online is often difficult for users with vision impairments. This population relies on text descriptions of images that vary based on website authors’ accessibility practices. Where one author might provide a descriptive caption for an image, another might provide no caption for the same image, leading to inconsistent experiences. In this work, we present the Caption Crawler sy...
Prioritize the ordering of URL queue in Focused crawler
The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, downloading domain-specific web pages is not a simple task, and an unfocused approach often yields undesired results. Therefore, several new ideas have been proposed; among them, a key technique is focused crawling, which is able to crawl particular topical...
Marie-4: A High-Recall, Self-Improving Web Crawler That Finds Images Using Captions
page text describes associated images, and images are not captioned consistently. Content-based image retrieval systems that analyze the images themselves¹ are progressing, but the systems require considerable image-preprocessing time. Furthermore, surveys of users doing image retrieval show that users are more interested in the identification of objects and actions depicted by images than in t...
The Use of Object Labels and Spatial Prepositions as Keywords in a Web-Retrieval-Based Image Caption Generation System
In this paper, a retrieval-based caption generation system that searches the web for suitable image descriptions is studied. Google’s search-by-image is used to find potentially relevant web multimedia content for query images. Sentences are extracted from web pages and the likelihood of the descriptions is computed to select one sentence from the retrieved text documents. The search mechanism ...
An Improved Pixon-Based Approach for Image Segmentation
An improved pixon-based method is proposed in this paper for image segmentation. In this approach, a wavelet thresholding technique is first applied to the image to reduce noise and slightly smooth it. This prevents the image from being oversegmented when the pixon-based method is used. Indeed, the wavelet thresholding, as a pre-processing step, eliminates the unnecessary details...
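The focused-crawling entry in this list describes ordering the URL queue so that topically relevant pages are fetched first. A common way to realize such an ordering is a priority queue. The sketch below assumes a deliberately simple relevance score, the fraction of topic terms found in a link's anchor text. This score is an illustration and not the scoring used in the cited paper. `FocusedFrontier` and its methods are hypothetical names.

```python
import heapq
import itertools


class FocusedFrontier:
    """Hypothetical URL queue for a focused crawler: highest-scoring URL first."""

    def __init__(self, topic_terms):
        self.topic_terms = {t.lower() for t in topic_terms}
        self._heap = []
        self._seen = set()
        self._counter = itertools.count()  # tie-breaker: equal scores pop in insertion order

    def score(self, anchor_text):
        # Assumed relevance score: fraction of topic terms present in the anchor text.
        if not self.topic_terms:
            return 0.0
        words = set(anchor_text.lower().split())
        return len(words & self.topic_terms) / len(self.topic_terms)

    def push(self, url, anchor_text):
        if url in self._seen:
            return  # skip URLs that were already queued once
        self._seen.add(url)
        # heapq is a min-heap, so store the negated score to pop the best URL first.
        heapq.heappush(self._heap, (-self.score(anchor_text), next(self._counter), url))

    def pop(self):
        """Remove and return (url, score) for the most relevant queued URL."""
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)
```

The insertion counter keeps the heap from comparing URLs when scores tie, so links with equal scores are crawled in the order they were discovered. A real focused crawler would likely replace `score` with a classifier or a similarity measure over page content.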
Publication date: 2012